Entry Name: "PLAUST-Liu-MC2"

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Liu Bin, PLA University of Science and Technology, liubin_1977@hotmail.com Primary

Chen Gang, PLA University of Science and Technology, chengang391@126.com

Dong Kun, PLA University of Science and Technology, 3232214246@qq.com

Fang Lehong, PLA University of Science and Technology, 1586825402@qq.com

Student Team:  NO

 

Did you use data from both mini-challenges? NO

 

Analytic Tools Used:

Eagleyes was developed by the PLA University of Science and Technology MTDC 1006 Data Visualization class, taught Spring 2015 by Liu Bin, and used by the team for the challenge.

PyQT

Pandas

 

Approximately how many hours were spent working on this submission in total?

300

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

 

Video:

PLAUST-Liu-MC2.wmv

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Identify those IDs that stand out for their large volumes of communication. For each of these IDs:

 

      a.      Characterize the communication patterns you see.

      b.      Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

 

Eagleyes, an interactive dataflow engine, is used for the analysis. The "GroupByID" node in the dataflow (see Fig. 1(a)) aggregates the total communication volume of each ID and shows the results in the window of the "RowSelect" node (see Fig. 1(b)). The communication volumes of ID 1278894 and ID 839736 are clearly much larger than those of the other IDs.

The communication patterns of ID 1278894 and ID 839736 are shown in Figs. 2 to 4. Based on these patterns, we infer that ID 1278894 is a broadcasting polling machine and ID 839736 is a ticket-checking machine.

01.png

Figure 1. Workflow designed for the challenge, and the results aggregated by the "GroupByID" node.
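The per-ID aggregation can be approximated outside Eagleyes with Pandas. The following is a minimal sketch, not the "GroupByID" node itself; it assumes the communication records are loaded into a DataFrame whose sender and receiver columns are named "from" and "to" (hypothetical names), and the sample rows are invented for illustration.

```python
import pandas as pd

def communication_volume(df):
    """Count messages sent plus messages received for every ID."""
    sent = df["from"].value_counts()
    received = df["to"].value_counts()
    # IDs that only send or only receive get 0 for the missing side.
    total = sent.add(received, fill_value=0).astype(int)
    return total.sort_values(ascending=False)

# Invented sample records; column names "from" and "to" are assumptions.
sample = pd.DataFrame({
    "from": ["A", "A", "B", "C"],
    "to":   ["B", "C", "A", "A"],
})
volume = communication_volume(sample)
print(volume)
```

IDs at the top of the sorted result are the candidates for closer inspection.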

 

02.png

Figure 2. The "Group Barcode" node shows all IDs that communicated with ID 1278894. These IDs can be organized into one or more groups by setting different filtering boundaries on their communication volumes. The 2D code shows that the communication volumes of these IDs are almost the same, so they can be placed in a single group, such as "Group 0" shown in Fig. 3.

 

03.png

Figure 3. The "Time Falls" node shows when, where, and with whom ID 1278894 communicated. (a) highlights "Group 0", the grouping result of the "Group Barcode" node shown in Fig. 2; (b) and (c) highlight some of the IDs outside the group. ID 1278894 sends messages to a group every 5 minutes for an hour starting at 12:00 noon, resumes after a one-hour break, and continues until 21:00. The group responds within a few minutes. The membership of the group is mostly fixed, apart from a few IDs that leave or join midway. Throughout this process, ID 1278894 is always located in the entry corridor.
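A regular sending rhythm of this kind can also be checked numerically. The sketch below is an illustration only, assuming a DataFrame with a "from" column and a datetime "Timestamp" column (hypothetical names) and invented sample rows. It collapses a broadcast to many receivers into one send event per minute and counts the gaps between consecutive events.

```python
import pandas as pd

def send_intervals(df, sender):
    """Count the gaps between the distinct minutes in which `sender` sent messages."""
    times = df.loc[df["from"] == sender, "Timestamp"]
    # A broadcast reaches many receivers at once; keep one entry per minute.
    minutes = pd.Series(times.dt.floor("min").unique()).sort_values()
    return minutes.diff().dropna().value_counts()

# Invented sample records; column names are assumptions.
sample = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2014-06-06 12:00:05", "2014-06-06 12:00:07",
        "2014-06-06 12:05:01", "2014-06-06 12:10:03",
    ]),
    "from": ["S", "S", "S", "S"],
    "to":   ["a", "b", "a", "b"],
})
gaps = send_intervals(sample, "S")
print(gaps)
```

A single dominant gap value suggests a timer-driven sender rather than a person.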

 

04.png

Figure 4. (a) and (b) show the communication pattern of ID 839736 using the same method as Fig. 3; (c) uses the "Timeline" node to show the communication volume of ID 839736 over the three days. ID 839736 receives messages from different IDs all the time and responds to them within a few minutes. These IDs cover almost all IDs that appeared on a given day. In particular, the communication volume of ID 839736 increased unusually at 12:00 on the third day, which provides an important clue for answering MC2.3.

 

 

MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

 

Limit your response to no more than 10 images and 1000 words.

 

 

The communication data contain four common patterns (Figs. 5 to 8). Comparing IDs against these common patterns reveals three unusual patterns (Figs. 9 to 14).

05.png

Figure 5. Common pattern 1: IDs with huge communication volumes. The volume consists mainly of broadcast communication within groups. Group memberships may overlap.

 

06.png

Figure 6. Common pattern 2: IDs with medium communication volumes. Compared with pattern 1, these IDs join somewhat fewer groups, and those groups are somewhat less active.

 

07.png

Figure 7. Common pattern 3: IDs with smaller communication volumes. Communication with special IDs, such as ID 1278894, makes up a relatively large proportion of their volume.

 

08.png

Figure 8. Common pattern 4: IDs whose communication is mainly sporadic and peer-to-peer.

 

09.png

Figure 9. Pattern 5: isolated groups. The figure shows an isolated group made up of members 300315, 1932220, 32672, 98371, 125303, etc. The members communicate only within the group, apart from special IDs such as ID 1278894 and ID 839736. Groups of this kind appear on all three days.

 

 

10.png

Figure 10. Pattern 5, continued: each isolated group appeared on only one of the three days, so the isolated groups have no members in common. The figure shows three isolated groups: "Group 1", made up of IDs 32672, 98371, 125303, 140461, etc., appeared only on the first day; "Group 2", made up of IDs 86922, 100461, 128881, 165079, etc., appeared only on the second day; and "Group 3", made up of IDs 436, 2232, 119769, 120395, etc., appeared only on the third day.
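One way to look for isolated groups programmatically is to drop every message that involves a special ID and then take the connected components of the remaining who-talks-to-whom graph. This is a substitute technique offered as a sketch, not how the "Group Barcode" node works; the column names "from" and "to", the set of special IDs, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd

def isolated_groups(df, special_ids):
    """Connected components of the communication graph, ignoring special IDs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Keep only messages where neither endpoint is a special ID.
    mask = ~df["from"].isin(special_ids) & ~df["to"].isin(special_ids)
    for a, b in zip(df.loc[mask, "from"], df.loc[mask, "to"]):
        parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)

# Invented sample records; 99 stands in for a special ID.
sample = pd.DataFrame({
    "from": [1, 2, 4, 1, 4],
    "to":   [2, 3, 5, 99, 99],
})
groups = isolated_groups(sample, special_ids={99})
print(groups)
```

Running this separately on each day's records and intersecting the member sets would test the claim that no isolated group spans more than one day.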

 

11.png

Figure 11. Pattern 6: the communication volumes of some IDs fluctuate abnormally at certain times. For instance, ID 195725 continually sent messages to the group and to ID 839736 from 11:40 to 12:16 on the third day (other messages are filtered out by the sliding window of the timeline and are not shown in the figure).

 

 

12.png

Figure 12. Pattern 7: some IDs communicate only with specific IDs, and do so extremely frequently. For instance, ID 1149884 communicated only with ID 839736 and the external ID, and the communication was particularly frequent in the Wet Land during two intervals on the third day, 12:54 to 13:50 and 14:47 to 15:10.

 

13.png

Figure 13. ID 1217381 shows a pattern similar to pattern 7.

 

14.png

Figure 14. ID 1601276 shows a pattern similar to pattern 7.
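IDs matching pattern 7 can be shortlisted by comparing how many messages an ID exchanges with how many distinct partners it has. The sketch below is illustrative only; it assumes "from" and "to" column names, and the thresholds and sample rows are invented rather than taken from the analysis.

```python
import pandas as pd

def partner_stats(df):
    """Per ID: total messages exchanged and number of distinct partners."""
    # List every message twice, once from each endpoint's point of view.
    pairs = pd.concat([
        df[["from", "to"]].rename(columns={"from": "id", "to": "partner"}),
        df[["from", "to"]].rename(columns={"to": "id", "from": "partner"}),
    ])
    return pairs.groupby("id")["partner"].agg(messages="count", partners="nunique")

# Invented sample records; column names are assumptions.
sample = pd.DataFrame({
    "from": ["X", "X", "X", "Y", "Y"],
    "to":   ["H", "H", "H", "Z", "W"],
})
stats = partner_stats(sample)

# Hypothetical thresholds: many messages, very few partners.
suspects = stats[(stats["partners"] <= 1) & (stats["messages"] >= 3)]
print(suspects)
```

Hub IDs such as the one standing in as "H" here also pass a simple filter like this when the sample is tiny, so in practice known special IDs would be excluded first.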

 

 

MC2.3 From this data, can you hypothesize when the crime was discovered? Describe your rationale.

 

Limit your response to no more than 3 images and 300 words.

 

 

The discussion above shows that something occurred in the Wet Land around noon on the third day, so we look for the earliest abnormal fluctuation in the communication of a group or an ID. We find that an isolated group meets these conditions (see Figs. 15 and 16), so we infer that the crime was discovered at 11:29 AM on June 8, 2014.

15.png

Figure 15. The "Group Barcode" node shows the members of the isolated group with the earliest abnormal communication fluctuation.

 

16.png

Figure 16. The "Time Falls" node shows when and where the communication volume of the group fluctuates abnormally.
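The search for the earliest abnormal fluctuation was done visually in Eagleyes. A rough numeric counterpart, given here only as a sketch, is to bin the timestamps of a group's messages and report the first bin whose count far exceeds the typical bin. The bin width, the multiplier, and the sample timestamps below are all invented assumptions.

```python
import pandas as pd

def first_burst(timestamps, freq="5min", factor=3.0):
    """Return the start of the first time bin whose message count exceeds
    `factor` times the median count of the non-empty bins, or None."""
    counts = timestamps.dt.floor(freq).value_counts().sort_index()
    baseline = counts.median()  # empty bins are not counted in the baseline
    bursts = counts[counts > factor * baseline]
    return bursts.index[0] if len(bursts) else None

# Invented sample: three quiet bins, then ten messages within one bin.
quiet = ["2014-06-08 09:00:10", "2014-06-08 09:05:20", "2014-06-08 09:10:30"]
busy = ["2014-06-08 10:30:%02d" % s for s in range(10)]
sample_times = pd.Series(pd.to_datetime(quiet + busy))

burst_start = first_burst(sample_times)
print(burst_start)
```

A threshold based on the median is deliberately crude; it only narrows down where to look before the pattern is confirmed in the "Time Falls" view.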